{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/LC1332/Chat-Haruhi-Suzumiya/blob/main/notebook/gradio_with_Image.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Chat凉宫春日的Gradio图片版本\n",
        "\n",
        "[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)]()\n",
        "[![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)]()\n",
        "\n",
        "**Chat凉宫春日**是模仿凉宫春日等一系列动漫人物，使用近似语气、个性和剧情聊天的语言模型，\n",
        "\n",
        "<details>\n",
        "  <summary> 本项目由李鲁鲁，冷子昂，闫晨曦，封小洋，scixing等开发。 </summary>\n",
        "\n",
        "李鲁鲁发起了项目，并完成了最早的版本，在多个微信群实现了测试。完成了GPT自动生成对话部分。\n",
        "\n",
        "冷子昂参与了早期Gradio的开发，并且参与了后端和前端的选型。debug和最终上线了app.py\n",
        "\n",
        "闫晨曦将李鲁鲁的notebook重构为app.py\n",
        "\n",
        "封小洋进行了中文转日文模型的选型，完成了针对台词抽取图片的工具，（将要完成）和haruhi图片分类器\n",
        "\n",
        "scixing实践了VITS语音，完成了台词对应的语音抽取，（将要完成）特定人物的语音分类器。\n",
        "\n",
        "贾曜恺正在实验一个带图片和语音的前端实现方案。\n",
        "\n",
        "</details>\n",
        "\n",
        "这个脚本是第一个图文版本的Chat凉宫春日，\n",
        "\n",
        "使用了闫晨曦重构的app.py， 冷子昂完成了debug并完成了图文的Gradio实现\n",
        "\n",
        "- [ ] 图片数据还在逐步增加中\n",
        "\n",
        "运行这个notebook你需要OpenAI的API Token\n",
        "\n",
        "项目链接 [https://github.com/LC1332/Prophet-Andrew-Ng](https://github.com/LC1332/Prophet-Andrew-Ng)\n",
        "\n",
        "骆驼先知是[Luotuo(骆驼)](https://github.com/LC1332/Luotuo-Chinese-LLM)的子项目之一，后者由李鲁鲁，冷子昂，陈启源发起。"
      ],
      "metadata": {
        "id": "GgB_5NPqbsDA"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!pip -q install transformers\n",
        "!pip -q install openai\n",
        "!pip -q install tiktoken\n",
        "!pip -q install langchain\n",
        "!pip -q install gradio"
      ],
      "metadata": {
        "id": "5mxafBYUWItg",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "d528fbed-666d-4f28-c918-a30acd88db6c"
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.2/7.2 MB\u001b[0m \u001b[31m72.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m236.8/236.8 kB\u001b[0m \u001b[31m26.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m122.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m73.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.6/73.6 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m41.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.5/114.5 kB\u001b[0m \u001b[31m10.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m28.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.6/149.6 kB\u001b[0m \u001b[31m13.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.7/1.7 MB\u001b[0m \u001b[31m38.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m32.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m9.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.1/49.1 kB\u001b[0m \u001b[31m4.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m20.0/20.0 MB\u001b[0m \u001b[31m52.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m57.0/57.0 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m288.3/288.3 kB\u001b[0m \u001b[31m26.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.4/75.4 kB\u001b[0m \u001b[31m8.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m50.5/50.5 kB\u001b[0m \u001b[31m6.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m137.0/137.0 kB\u001b[0m \u001b[31m16.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.7/45.7 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m129.9/129.9 kB\u001b[0m \u001b[31m15.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m5.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m67.0/67.0 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m72.5/72.5 kB\u001b[0m \u001b[31m8.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Building wheel for ffmpy (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!pip -q install sentencepiece"
      ],
      "metadata": {
        "id": "b0qD3qvAHIT3",
        "outputId": "bdbfb67c-7b8b-498c-8edd-00f68673b1e8",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[?25l     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/1.3 MB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K     \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[91m╸\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m48.1 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m33.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "zxdkfrsgJUeU"
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OfHGuLxnV6yI",
        "outputId": "82c62c73-7fc9-4a31-a188-d198140c93a4"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Cloning into 'Chat-Haruhi-Suzumiya'...\n",
            "remote: Enumerating objects: 1342, done.\u001b[K\n",
            "remote: Counting objects: 100% (115/115), done.\u001b[K\n",
            "remote: Compressing objects: 100% (88/88), done.\u001b[K\n",
            "remote: Total 1342 (delta 64), reused 50 (delta 27), pack-reused 1227\u001b[K\n",
            "Receiving objects: 100% (1342/1342), 45.52 MiB | 9.48 MiB/s, done.\n",
            "Resolving deltas: 100% (509/509), done.\n"
          ]
        }
      ],
      "source": [
        "!git clone https://github.com/LC1332/Chat-Haruhi-Suzumiya.git"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "os.environ[\"OPENAI_API_KEY\"] = 'sk-lfrdoJK'"
      ],
      "metadata": {
        "id": "Y0LgGKVxZ4oc"
      },
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "增加一个保存目录吧"
      ],
      "metadata": {
        "id": "3rT10D3YMabv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# from google.colab import drive\n",
        "# drive.mount('/content/drive')\n",
        "\n",
        "# save_path = \"/content/drive/MyDrive/GPTData/Haruhi-Generated-with-image/\""
      ],
      "metadata": {
        "id": "TpNIiTo2Mim-"
      },
      "execution_count": 9,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "%cd Chat-Haruhi-Suzumiya/src"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3s1LIpccV8xk",
        "outputId": "5a1c39ed-c2e1-4f96-fde5-795238a3bd0f"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "/content/Chat-Haruhi-Suzumiya/src\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "目前的版本embedding用CPU抽取，要等个3分钟"
      ],
      "metadata": {
        "id": "w-1kVViEPVXa"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!python app.py\n",
        "\n",
        "# 如果你想保存聊天记录的话 用这个\n",
        "# !python app.py --save_path \"/content/drive/MyDrive/GPTData/Haruhi-Generated-with-image/\""
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "GmXTN64iWFXM",
        "outputId": "6d2e10bc-d122-4ae2-ac76-1d07f2cd58b9"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "2023-06-13 14:26:34.346859: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
            "Downloading (…)lve/main/config.json: 100% 966/966 [00:00<00:00, 5.87MB/s]\n",
            "Downloading (…)solve/main/models.py: 100% 21.1k/21.1k [00:00<00:00, 81.5MB/s]\n",
            "A new version of the following files was downloaded from https://huggingface.co/silk-road/luotuo-bert:\n",
            "- models.py\n",
            ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n",
            "Downloading pytorch_model.bin: 100% 414M/414M [00:01<00:00, 254MB/s]\n",
            "文件夹 'Suzumiya' 创建成功！\n",
            "Downloading (…)lve/main/config.json: 100% 753/753 [00:00<00:00, 3.65MB/s]\n",
            "Downloading pytorch_model.bin: 100% 1.32G/1.32G [00:48<00:00, 27.0MB/s]\n",
            "Downloading (…)okenizer_config.json: 100% 299/299 [00:00<00:00, 1.74MB/s]\n",
            "Downloading spiece.model: 100% 1.51M/1.51M [00:00<00:00, 2.82MB/s]\n",
            "Downloading (…)cial_tokens_map.json: 100% 65.0/65.0 [00:00<00:00, 414kB/s]\n",
            "/usr/local/lib/python3.10/dist-packages/transformers/convert_slow_tokenizer.py:454: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.\n",
            "  warnings.warn(\n",
            "Downloading (…)okenizer_config.json: 100% 539/539 [00:00<00:00, 3.41MB/s]\n",
            "Downloading (…)solve/main/vocab.txt: 100% 110k/110k [00:00<00:00, 682kB/s]\n",
            "Downloading (…)/main/tokenizer.json: 100% 439k/439k [00:00<00:00, 1.78MB/s]\n",
            "Downloading (…)cial_tokens_map.json: 100% 125/125 [00:00<00:00, 769kB/s]\n",
            "/content/Chat-Haruhi-Suzumiya/src/app.py:329: UserWarning: You have unused kwarg parameters in Textbox, please remove them: {'placeholde': '输入角色名'}\n",
            "  role_name = gr.Textbox(label=\"角色名\", placeholde=\"输入角色名\")\n",
            "Running on local URL:  http://127.0.0.1:7860\n",
            "Running on public URL: https://722711ab1904dcc467.gradio.live\n",
            "\n",
            "This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n",
            "history done\n",
            "2\n",
            "3\n",
            "['为什么剪头发', '不重要的事情', '电脑是怎么来的', '交往的男生', '谁来写网站', '春日与阿虚', '找管理员借钥匙']\n",
            "当前辅助sample: ['为什么剪头发', '不重要的事情', '电脑是怎么来的', '交往的男生', '谁来写网站', '春日与阿虚', '找管理员借钥匙']\n",
            "history done\n",
            "2\n",
            "3\n",
            "['为什么剪头发', '不重要的事情', '电脑是怎么来的', '春日与阿虚', '自己建一个社团就好啦', '开学第二天', '让阿虚帮忙建社团']\n",
            "当前辅助sample: ['为什么剪头发', '不重要的事情', '电脑是怎么来的', '春日与阿虚', '自己建一个社团就好啦', '开学第二天', '让阿虚帮忙建社团']\n",
            "Keyboard interruption in main thread... closing server.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "oi68gwSfO1uH"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}